Abstract: We argue that the modern computer science technique of computational complexity analysis can provide powerful insights into the algorithm-neutral analysis of information-processing tasks. In particular, we show that a simple, theory-neutral linguistic model of syntactic agreement and lexical ambiguity demonstrates that natural language parsing may be computationally intractable, extending the classic work of Chomsky and Miller (1963). Significantly, we show that it may be syntactic features rather than complex rules that cause this difficulty. Informally, human languages and the computationally intractable satisfiability problem (SAT) share two costly computational mechanisms: both enforce agreement among terminal symbols across unbounded distances, and both allow terminal symbol ambiguity. In natural languages, lexical elements may be required to agree (or disagree) on such features as person, number, and gender (e.g., subject/verb agreement in English); in SAT, agreement ensures the consistency of variable truth assignments. Lexical ambiguity can appear freely in natural language utterances ("can" may be a noun, verb, or auxiliary), while a variable in a SAT formula may be either true or false. When coupled with a deterministic performance model, this complexity result explains a subtle psycholinguistic distinction between discovering and verifying the grammaticality of an utterance. Finally, the applicability of computational complexity analysis to other cognitive faculties such as vision is discussed.
1 Introduction

What is language? On one account, it is our ability to pair sound and meaning: ultimately, an information-processing task. In this paper, we argue that modern computational complexity theory can provide powerful insights into the structure of this problem by providing an algorithm-neutral analysis of information-processing structure.

Specifically, we show two things. First, contrary to what is commonly assumed, most, perhaps all, natural languages are not easy to parse: some grammatical sentences are too complex to be understood by a person or a machine. Second, computational complexity theory's distinction between the difficulty of finding a solution and that of verifying a solution has a precise analog in the domain of natural language processing. In brief, we demonstrate formally that sentences combining syntactic agreement with syntactically ambiguous words can quickly become too difficult to parse, although their well-formedness may be easily verified once a paraphrased "solution" is provided. Since possibly all natural languages exhibit agreement and ambiguity, such as subject-verb agreement and noun/verb homophones like "block" in English (see section 4), this result provides a robust, modern counterpart to Miller and Chomsky's classic distinction between abstract knowledge of language (linguistic competence) and how that knowledge is put to use (performance). Our result moves beyond Miller and Chomsky's in four ways: its application of computational complexity theory; its formal specification of the syntactic phenomena of agreement and ambiguity as a precise model that we call agreement grammars; its prediction of a specific class of sentences that are difficult to analyze but easy to check for well-formedness in retrospect, as a consequence of a sentence processor's purely deterministic operation rather than simply its finite characterization; and its broad applicability to most, perhaps all, natural languages.

The remainder of this paper is organized as follows. Section 2 outlines our approach to applying complexity theory in the language processing domain, reviewing the essential terminology of computational complexity theory that will be used in the sequel. Section 3 formalizes the purely syntactic phenomena of agreement and ambiguity in terms of agreement grammars. It then outlines a proof that any natural language containing syntactically ambiguous elements and agreement constraints will contain sentences that are computationally intractable to parse. Section 4 discusses the implications of this result, indicating how the distinction between solution and verification is reflected in human sentence processing. An appendix provides formal details of our proofs.

2 Complexity Theory and Psychological Models 

Following Marr (1980), we assume that the scientific explanation of any complex biological information-processing system demands at least three distinct theoretical levels: (1) a computational theory, explaining what is computed and why, including algorithm-neutral representations for the input and output of the process; (2) an algorithmic theory that can account for the transformation of input to output; and (3) an implementation theory, describing the (hardware) device in which the representation and algorithm are physically realized. Accordingly, the study of linguistic knowledge divides into the study of competence and performance. A theory of competence corresponds to Marr's topmost level of computational theory, explaining what information structures are computed and why, while abstracting away from algorithmic details, memory limitations, shifts of attention or interest, and errors. Marr's remaining levels belong to the theory of performance, which proposes a (representation, algorithm, implementation) triple to account for actual language use.

Once we understand the topmost of Marr's levels — the computational 
theory of an information-processing problem — we can understand more about 
the other levels as well: 

Although algorithms and mechanisms are empirically more accessible, it is the top level, the level of computational theory, which is critically important from an information-processing point of view. The reason for this is that the nature of the computations that underlie perception depends more upon the computational problems that have to be solved than upon the particular hardware in which their solutions are implemented. To phrase the matter another way, an algorithm is likely to be understood more readily by understanding the nature of the problem being solved than by examining the mechanism (and the hardware) in which it is embodied. (Marr, 1980, p. 27)
What, then, is the role of complexity theory in scientific explanation? Computational complexity theory measures the intrinsic difficulty of solving an (information-processing) problem no matter how its solution is obtained; an example is the problem of arranging a list of n names into alphabetic order. Inherently, then, complexity theory studies problem structure: it classifies problems according to the amount of computational resources (for example, time or space) needed to solve them on some abstract computer model, typically a deterministic Turing machine. Complexity classifications are invariant across a wide range of primitive machine models, all reasonable choices of representation, algorithm, and actual implementation, and even the resource measure itself.

It is important to see how powerful this invariance is. Any efficiently computable change in the problem representation that preserves the essential features of the original problem (preserving solutions to the original problem, in effect, its descriptive adequacy) can have no effect on its complexity classification. The robustness of these classifications makes complexity theory ideally suited for studying cognition: while we do know something about the abstract problems the brain solves, we do not know much about the representations, algorithms, or hardware involved. "If we believe that the aim of information-processing studies is to formulate and understand particular information-processing problems, then the structure of those problems is central . . . ." (Marr, 1980, p. 347).

The two complexity classes we distinguish below are P and NP. P is the natural and important class of problems solvable in deterministic polynomial time, that is, on a deterministic Turing machine in time $n^j$ for some integer $j$, where $n$ denotes the size of the problem to be solved.¹ P is considered to be the class of problems that can be solved efficiently. For example, sorting takes $n \log n$ time in the worst case using a variety of algorithms, and is therefore efficiently solvable.

¹ Problems must be encoded in a "reasonable" way for a size measure to make sense; for discussion, see Garey and Johnson (1979).

NP is the class of all problems solvable in Nondeterministic Polynomial time in the worst case. Informally, a problem is in NP if one can guess an answer to the problem and then verify its correctness in polynomial time. Many such problems have no known polynomial-time (efficient) solution algorithms. For example, the problem of deciding whether a whole number $i$ is composite is in NP because it can be solved by guessing a pair of potential divisors, each greater than 1, and then quickly checking whether their product equals $i$.
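The guess-and-verify idea can be made concrete with a short Python sketch for the compositeness example. The function names `verify_composite` and `find_factor_pair` are hypothetical, and trial division is simply the most naive deterministic search, chosen for contrast rather than as the best known method. (Compositeness has since been shown to be decidable in polynomial time by the AKS primality test of 2002, so the example illustrates cheap verification, not intractability.)

```python
from typing import Optional, Tuple


def verify_composite(i: int, certificate: Tuple[int, int]) -> bool:
    """Check a guessed certificate (p, q) for the claim "i is composite".

    One multiplication and two comparisons suffice, so the check runs in
    time polynomial in the number of digits of i.
    """
    p, q = certificate
    return p > 1 and q > 1 and p * q == i


def find_factor_pair(i: int) -> Optional[Tuple[int, int]]:
    """Search deterministically for a certificate by trial division.

    Up to about sqrt(i) candidates may be tried, which is exponential in
    the number of digits of i.
    """
    d = 2
    while d * d <= i:
        if i % d == 0:
            return (d, i // d)
        d += 1
    return None


if __name__ == "__main__":
    print(verify_composite(91, (7, 13)))  # True: 7 * 13 == 91
    print(find_factor_pair(91))           # (7, 13)
    print(find_factor_pair(97))           # None: 97 is prime
```

Checking a supplied pair costs the same no matter how large the divisors are, while the search cost grows with the size of the number itself.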

A problem T is NP-hard if it is at least as hard computationally as any problem in the class NP: if we had a subroutine that solved T in polynomial time, then we could write a program to solve any problem in NP in polynomial time (essentially by efficiently transforming the problem in NP into an instance of T and then solving T with the fast subroutine; the appendix gives a more detailed account of how this procedure, known as problem reduction, works). Note that T need not be in NP to be NP-hard. A problem is NP-complete if it is both in NP and NP-hard.

All known methods for solving NP-complete problems take exponential time in the worst case, which is too slow for even the fastest computers.² Since it is widely believed, though not yet proved, that no faster methods of solution can ever be found for these problems, NP-complete problems are considered computationally intractable. A famous NP-complete problem is the traveling salesman problem: to find the shortest route for a traveling salesman who must visit a number of cities and return to the starting city (strictly, it is the decision version, asking whether some route is no longer than a given bound, that is NP-complete). For additional details, the reader may refer to Garey and Johnson (1979); Lewis and Papadimitriou (1978); or Barton, Berwick, and Ristad (1987). This last work further explores the relationship between computational complexity and natural language.

3 Modeling agreement and ambiguity 

Having reviewed the basic terminology of complexity theory, we now turn to the problem of formally modeling agreement and ambiguity in natural languages.

Syntactic agreement and ambiguity are widespread in human languages. Agreement can be morphological (word based) or structural, and can hold across unbounded distances and among unlimited sets of elements. This is quite easy to demonstrate. For example, in nearly all languages, predicates must agree with their arguments. In English, morphological agreement includes subject-verb agreement on person, number, gender, animacy, humanity, abstractness, quantity, and other features. Agreement occurs at the intra-morpheme level in some languages, for example, in Turkish, where a suffix ending such as "u" forces agreement in vowel qualities with preceding vowels; this phenomenon occurs widely in such diverse languages as Finnish, Arabic, Hebrew, and the Australian language Warlpiri.

² However, some NP-complete problems have good average-time behavior; that is, the instances that occur most often can be efficiently solved. We discuss such behavior in relation to our NP-completeness result in section 4.

Case marking is another form of agreement that surfaces both morphologically and syntactically in natural languages. Noun forms such as epithets, pronouns, and anaphora may be required to agree or disagree with other noun forms in person, number, gender, and so forth, as in "Reagan, the fool, believed he could appoint justices himself." Typically such agreement can hold over an unbounded number of words or phrases. In short, syntactic agreement is a widespread phenomenon of natural languages generally, perhaps found in all natural languages.

Ambiguity is equally common in natural languages. Syntactic homonyms are typical: in English, the word "block" may be a noun or a verb. Ambiguity in quantifier scope and reference is equally common, for example, the two meanings of "Everyone loves someone."

Descriptively adequate linguistic theories must therefore describe these 
two phenomena, and, in fact, all major linguistic theories do so, using three 
devices: (1) distinctive features to represent the dimensions of agreement; 
(2) an agreement enforcement mechanism; and (3) provision for lexical and 
structural (syntactic) ambiguity. 

While different theories work out the details of these three devices in different ways, one can abstract away from these variations in order to model just agreement and ambiguity and study their computational complexity. We introduce agreement grammars here as a simple formal linguistic model with exactly these three devices. Agreement grammars are not natural language grammars. For one thing, agreement grammars are too simple to completely model any natural language. They can also generate infinitely many unnatural languages, such as $\Sigma^*$, or any finite, regular, or context-free language. But while our results apply only to this simplified formal model of agreement and ambiguity, it is nonetheless true that all current linguistic theories that attempt to describe natural grammars readily embed the agreement grammar problem (see Barton, Berwick, and Ristad 1987). In this respect, our approach follows that of Kirousis and Papadimitriou (1985), who study the complexity of a formal model of scene recognition known as line-labeling. While line-labeling is not the same as scene recognition, it may be construed as a simple formal model embedded in the full-scale problem of scene recognition.
We begin with an informal introduction, and follow that with a formal specification of agreement grammars.

3.1 Defining agreement grammars 

Following conventional notation, we first recall that a context-free grammar $G$ is a 4-tuple,

$$G = (V_N, V_T, P, S)$$

where $V_N$ is a finite set of nonterminal symbols, $V_T$ a finite set of terminal symbols, $P$ a finite set of productions of the form $A \to \gamma$, where $A \in V_N$ and $\gamma \in (V_N \cup V_T)^*$, and $S$ is a distinguished start symbol. If $P$ contains a production $A \to \gamma$, then for any $\alpha, \beta \in (V_N \cup V_T)^*$, we write $\alpha A \beta \Rightarrow \alpha \gamma \beta$ and say that $\alpha A \beta$ derives $\alpha \gamma \beta$ with respect to $G$. We let $\Rightarrow^*$ be the reflexive transitive closure of $\Rightarrow$, dropping the clause "with respect to $G$" where it is understood from context. The language $L(G)$ generated by a context-free grammar is the set of all strictly terminal strings that can be derived from the start symbol with respect to $G$, that is,

$$L(G) = \{x : x \in V_T^* \text{ and } S \Rightarrow^* x\}$$
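As a minimal illustration of the derives relation, the Python sketch below computes $L(G)$ for a small context-free grammar by expanding every production. The grammar `G1_STRIPPED` is hypothetical: it is the example grammar $G_1$ given later in this section with its PER and PLU features removed. The function `language` is likewise illustrative and assumes a non-recursive grammar, so that $L(G)$ is finite and the recursion terminates.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# A grammar maps each nonterminal to its list of right-hand sides.
Grammar = Dict[str, List[Tuple[str, ...]]]


def language(grammar: Grammar, symbol: str) -> Set[Tuple[str, ...]]:
    """Return every terminal string x with symbol =>* x.

    Symbols that are not keys of `grammar` are treated as terminals.
    Assumes the grammar is non-recursive, so the recursion terminates.
    """
    if symbol not in grammar:
        return {(symbol,)}
    strings: Set[Tuple[str, ...]] = set()
    for rhs in grammar[symbol]:
        # Each daughter contributes one of its derivable strings; concatenate them.
        for pieces in product(*(language(grammar, s) for s in rhs)):
            strings.add(tuple(word for piece in pieces for word in piece))
    return strings


# Hypothetical grammar: G_1 with all PER/PLU features stripped off.
G1_STRIPPED: Grammar = {
    "S": [("NP", "VP")],
    "VP": [("V",)],
    "NP": [("N",), ("I",), ("John",)],
    "N": [("men",)],
    "V": [("sleep",), ("sleeps",)],
}

if __name__ == "__main__":
    for sentence in sorted(language(G1_STRIPPED, "S")):
        print(" ".join(sentence))
```

Without features, this grammar derives six strings, including "men sleeps" and "John sleep"; this is the kind of overgeneration that agreement features are meant to control.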

We extend context-free grammars to obtain agreement grammars (AGs) by adding nonterminals that are sets of features and by imposing an agreement condition on the derivation relation.

A feature is a [feature-name feature-value] pair. For example, [PER 1] is a possible feature, denoting first person. Some features may be designated agreement features and required to match other features (see below). For instance, an AG nonterminal labeling the first-person pronoun "I" could be written {[CAT N], [PLU -], [PER 1]}, while the singular verb "sleeps" could be labeled with the AG nonterminal {[CAT V], [PLU -], [PER 3]}.

More formally, the set of nonterminals in an agreement grammar is characterized by a specification $\langle F, A, \rho \rangle$, where $F$ is a finite set of feature names, $A$ is a finite set of feature values, and $\rho$ is a function from feature names to permissible feature values; that is, $\rho : F \to 2^A$. The specification $\langle F, A, \rho \rangle$ determines a finite set $V_N$ of nonterminals, where a nonterminal may also be thought of as a partial function from feature names to feature values:

$$V_N = \{C \in A^{\langle F \rangle} : \forall f \in \mathrm{DOM}(C)\,[C(f) \in \rho(f)]\}$$

Here $Y^{\langle X \rangle}$ is the set of all partial functions from $X$ to $Y$. $\mathrm{DOM}(C)$ is the domain of $C$, that is, the set $\{x : \exists y\,[(x, y) \in C]\}$. A category $C'$ extends a category $C$ (written $C' \sqsupseteq C$) if and only if $\forall f \in \mathrm{DOM}(C)\,[C'(f) = C(f)]$, that is, $C'$ is a superset of $C$. For example, the category {[PER 1], [NUM 1]} extends the category {[PER 1]}.
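One way to make these definitions concrete is to represent a category as a Python dictionary from feature names to values, which is exactly a finite partial function. This is an illustrative sketch, not part of the formal model: the names `is_category`, `extends`, and `all_categories` are hypothetical, and `RHO_G1` is the $\rho$ of the example grammar $G_1$ given later in this section. Because each feature is either undefined or takes one of its permissible values, $|V_N| = \prod_{f \in F} (|\rho(f)| + 1)$, which for $G_1$ is $6 \cdot 4 \cdot 3 = 72$.

```python
from itertools import product
from typing import Dict, List, Optional, Set

Category = Dict[str, str]  # a finite partial function: feature name -> value

# rho for the example grammar G_1 (feature name -> permissible values)
RHO_G1: Dict[str, Set[str]] = {
    "CAT": {"S", "VP", "NP", "V", "N"},
    "PER": {"1", "2", "3"},
    "PLU": {"+", "-"},
}


def is_category(c: Category, rho: Dict[str, Set[str]]) -> bool:
    """C is in V_N iff every feature in DOM(C) takes a value in rho(f)."""
    return all(f in rho and v in rho[f] for f, v in c.items())


def extends(c_prime: Category, c: Category) -> bool:
    """C' extends C iff C'(f) = C(f) for every f in DOM(C)."""
    return all(f in c_prime and c_prime[f] == v for f, v in c.items())


def all_categories(rho: Dict[str, Set[str]]) -> List[Category]:
    """Enumerate V_N: each feature is either left undefined or given one value."""
    names = sorted(rho)
    options: List[List[Optional[str]]] = [[None, *sorted(rho[f])] for f in names]
    return [
        {f: v for f, v in zip(names, choice) if v is not None}
        for choice in product(*options)
    ]


if __name__ == "__main__":
    print(extends({"PER": "1", "NUM": "1"}, {"PER": "1"}))  # True
    print(extends({"PER": "1"}, {"PER": "1", "NUM": "1"}))  # False
    print(len(all_categories(RHO_G1)))                       # 72
```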

An agreement grammar (AG) is a 5-tuple,

$$G = (\langle F, A, \rho \rangle, V_T, F_A, P, S)$$

whose first element specifies a set $V_N$ of syntactic categories and where $V_T$ is a finite terminal alphabet. $F_A$ is the set of agreement feature names, $F_A \subseteq F$. $S$ is the distinguished start symbol, $S \in V_N$. $P$ is a finite set of the usual context-free productions, each member taking one of the forms:

1. $C \to a$, where $C \in V_N$ and $a \in V_T$,

2. $C_0 \to C_1 \ldots C_n$, where each $C_i \in V_N$.

No so-called null productions (epsilon productions) are permitted: each production must have at least one non-null element on its right-hand side.

To complete our definition, we modify the derives relation to incorporate agreement. We say that a production $C'_0 \to C'_1 \ldots C'_n$ extends a production $C_0 \to C_1 \ldots C_n$ if and only if $C'_i$ extends $C_i$ for every $i$ and the mother's agreement features appear on every daughter:

1. $\forall i,\ 0 \le i \le n,\ [C'_i \sqsupseteq C_i]$, and

2. $\forall f \in (\mathrm{DOM}(C'_0) \cap F_A),\ \forall i,\ 1 \le i \le n,\ [(f \in \mathrm{DOM}(C'_i)) \land (C'_i(f) = C'_0(f))]$

The last condition (the agreement convention) ensures that all agreement features on the mother are also found on all daughters.
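The two conditions can be checked mechanically. The following sketch, using a hypothetical dictionary representation of categories, tests whether a proposed production $C'_0 \to C'_1 \ldots C'_n$ extends a production of form 2; the function names are illustrative only, and terminal productions (form 1) are not handled.

```python
from typing import AbstractSet, Dict, Sequence

Category = Dict[str, str]  # a finite partial function: feature name -> value


def extends(c_prime: Category, c: Category) -> bool:
    """C' extends C iff C'(f) = C(f) for every f in DOM(C)."""
    return all(f in c_prime and c_prime[f] == v for f, v in c.items())


def production_extends(
    new_lhs: Category,
    new_rhs: Sequence[Category],
    lhs: Category,
    rhs: Sequence[Category],
    agreement_features: AbstractSet[str],
) -> bool:
    """Does C'_0 -> C'_1 ... C'_n extend C_0 -> C_1 ... C_n?"""
    if len(new_rhs) != len(rhs):
        return False
    # Condition 1: every category, mother included, extends its counterpart.
    if not extends(new_lhs, lhs):
        return False
    if not all(extends(nd, d) for nd, d in zip(new_rhs, rhs)):
        return False
    # Condition 2 (agreement convention): each agreement feature defined on
    # the mother must be defined, with the same value, on every daughter.
    for f, v in new_lhs.items():
        if f in agreement_features and any(d.get(f) != v for d in new_rhs):
            return False
    return True


if __name__ == "__main__":
    F_A = {"PER", "PLU"}
    S_LHS = {"CAT": "S"}
    S_RHS = [{"CAT": "NP"}, {"CAT": "VP"}]
    plural_s = {"CAT": "S", "PLU": "+"}
    agreeing = [{"CAT": "NP", "PLU": "+"}, {"CAT": "VP", "PLU": "+"}]
    clashing = [{"CAT": "NP", "PLU": "+"}, {"CAT": "VP", "PLU": "-"}]
    print(production_extends(plural_s, agreeing, S_LHS, S_RHS, F_A))  # True
    print(production_extends(plural_s, clashing, S_LHS, S_RHS, F_A))  # False
```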

We may now define the language generated by an agreement grammar. If $P$ contains a production $A \to \gamma$ with an extension $A' \to \gamma'$, then for any $\alpha, \beta \in (V_N \cup V_T)^*$, we write $\alpha A' \beta \Rightarrow \alpha \gamma' \beta$. Let $\Rightarrow^*$ be the reflexive transitive closure of $\Rightarrow$ in the given grammar $G$. The language $L(G)$ generated by $G$ contains all terminal strings that can be derived from any extension of the start category:

$$L(G) = \{x : x \in V_T^* \text{ and } \exists S'\,[S' \sqsupseteq S \text{ and } S' \Rightarrow^* x]\}$$
A Natural Language Example. The following artificial agreement grammar $G_1$ models subject-verb agreement for person and number in English.

1. $G_1$ includes the set $F$ of feature names {CAT, PLU, PER} and the function $\rho$ defined by:

   $\rho(\mathrm{CAT}) = \{\mathrm{S}, \mathrm{VP}, \mathrm{NP}, \mathrm{V}, \mathrm{N}\}$
   $\rho(\mathrm{PER}) = \{1, 2, 3\}$
   $\rho(\mathrm{PLU}) = \{+, -\}$

   The start category S is {[CAT S]}, and the set of agreement feature names is $F_A = \{\mathrm{PER}, \mathrm{PLU}\}$. The feature CAT encodes the syntactic category of the nonterminal (sentence, noun phrase, and so forth). PER encodes person (first, second, or third), and PLU encodes number ([PLU +] is plural, [PLU -] is singular).

2. The terminal vocabulary of $G_1$ is

   $V_T$ = {I, men, John, sleep, sleeps}.



3. $G_1$ contains the following 9 productions:

   {[CAT S]} → {[CAT NP]} {[CAT VP]}
   {[CAT VP]} → {[CAT V]}
   {[CAT NP]} → {[CAT N]}
   {[CAT NP], [PLU -], [PER 1]} → I
   {[CAT N], [PLU +]} → men
   {[CAT NP], [PLU -], [PER 3]} → John
   {[CAT V], [PLU +]} → sleep
   {[CAT V], [PLU -], [PER 1]} → sleep
   {[CAT V], [PLU -], [PER 3]} → sleeps

The sample grammar generates exactly the following sentences: 

a. I sleep (={[CAT S],[PER 1],[PLU -]}) 

b. men sleep (= {[CAT S], [PLU +]}) 

c. John sleeps (= {[CAT S], [PER 3], [PLU -]}) 



We next turn to the computational complexity of recognizing sentences 
generated by an arbitrary agreement grammar. 
3.2 The computational complexity of agreement grammar recognition

Given an arbitrary agreement grammar, how hard is it to parse using the agreement features of that grammar? Computational complexity theory gives us a precise answer to this question. We may state the recognition problem for agreement grammars as follows:

Given an arbitrary agreement grammar AG and a string x, is $x \in L(AG)$?

This problem is NP-complete. Intuitively, feature agreement lets us "simulate" the problem of finding out whether there exists an assignment of truth values to variables that satisfies an arbitrary Boolean formula in 3-conjunctive normal form, that is, a formula such as this one

$$(x \lor y \lor z) \land (y \lor z \lor w)$$

where there are exactly three disjoined literals (variables or negated variables) per clause, and each clause is conjoined with the next. This problem is called 3SAT (for "three satisfiability"); the appendix provides a formal definition of this problem.³ Feature agreement simulates the assignment of truth values: if y is given the value true in one clause, then it must be true in all other clauses (and its negation $\bar{y}$ must be false). Syntactic category ambiguity simulates the fact that we must "guess" whether y is to have the value true or false, just as we must sometimes guess whether "block" is a noun or a verb. Finally, ordinary context-free productions may be used to guarantee that there is at least one true literal per clause, as is demanded for there to be a satisfying truth assignment. This simulation, formally called a reduction, establishes that AG recognition is NP-hard. To establish inclusion in NP, we use the absence of null productions in AGs to derive a polynomial bound on the length of a shortest derivation. Given this, it is easy to show that a nondeterministic program can "guess" a derivation witnessing the membership of x in L(AG) and check it in polynomial time. The proof follows that in Barton, Berwick, and Ristad (1987) and is spelled out in the appendix.
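The contrast between guessing and checking that drives this proof can be seen in a small Python sketch of 3SAT itself (not of the AG reduction, which the appendix gives). Literals are represented, by assumption, as (variable, polarity) pairs, and the function names are hypothetical. `satisfies` is the polynomial-time check; its single assignment dictionary gives each variable one value everywhere it occurs, which is the role agreement plays. `solve` is a deterministic exhaustive search over all $2^k$ assignments, which stands in for resolving every ambiguity by trial.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Literal = Tuple[str, bool]           # (variable, True) is x; (variable, False) is not-x
Formula = List[Tuple[Literal, ...]]  # a conjunction of clauses of three literals


def satisfies(formula: Formula, assignment: Dict[str, bool]) -> bool:
    """Verify in polynomial time: every clause needs at least one true literal.

    Because one dictionary supplies each variable's value, the variable
    takes the same value in every clause where it occurs ("agreement").
    """
    return all(
        any(assignment[var] == positive for var, positive in clause)
        for clause in formula
    )


def solve(formula: Formula) -> Optional[Dict[str, bool]]:
    """Find an assignment deterministically by trying all 2**k of them."""
    variables = sorted({var for clause in formula for var, _ in clause})
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(formula, assignment):
            return assignment
    return None


if __name__ == "__main__":
    # (x or y or z) and (y or z or w), the formula used in the text
    example: Formula = [(("x", True), ("y", True), ("z", True)),
                        (("y", True), ("z", True), ("w", True))]
    print(solve(example))
    # All eight sign patterns over x, y, z together are unsatisfiable.
    hard: Formula = [tuple(zip("xyz", signs))
                     for signs in product([True, False], repeat=3)]
    print(solve(hard))  # None
```

Checking one candidate touches each clause once, whereas the search may need every one of the $2^k$ candidates, as it does for the unsatisfiable formula.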

It is important to note that this complexity result is a function of both input sentence length and grammar size. At first glance, this might seem unreasonable. A child learning a language might be able to discover a more compact, highly efficient grammar to use. Similarly, people appear to use one grammar to process sentences, not a family of grammars. If the grammar were fixed, then it would not be part of the input to the problem, and a polynomial-time recognition algorithm might exist.

³ The possibility that the agreement grammar recognition problem might be NP-complete and a general idea of how to prove it arose out of a discussion between the authors and E. Barton.

But factoring out grammar size has many problems, as discussed in Barton, Berwick, and Ristad (1987). To summarize these: (1) Complexity analysis should consider all relevant inputs; grammar size is an important, direct component of recognition algorithms, and therefore it is wrong to ignore this dominant element of recognition time. This is especially true for natural languages, where grammar size is much larger than expected sentence length (typically by a factor of $10^3$ or more). (2) Known preprocessing steps for agreement grammars all fail, because they expand the grammar size exponentially, which acts as a huge constant factor of $2^{|G|}$ multiplying the recognition time. For example, a full grammar with 10,000 rules could require time $2^{10000} \cdot n^3$ to parse: polynomial time in a strict sense, but impossibly long in practical terms.
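To give a sense of the size of the constant factor in this example, a short calculation (added here for illustration) converts $2^{10000}$ to decimal:

$$2^{10000} = 10^{10000 \log_{10} 2} \approx 10^{3010.3}$$

so $2^{10000}$ is a number with 3,011 decimal digits. By comparison, the factor $n^3$ for a 20-word sentence is only $20^3 = 8000$.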

What this NP-completeness result means is that there is no known algorithm for determining membership in the language of an arbitrary agreement grammar that does not in effect exhaustively check an exponential number of possible feature combinations. Further, there is no known reasonable representational recasting of the AG recognition problem that would do better. Interestingly, this NP-completeness result does not rely on the context-free power of the AG model. The agreement grammar used in the reduction generates a regular language, and essentially the same reduction would apply to an agreement grammar whose language was finite. The reduction relies only on the combinatorial possibilities that arise from nonlocal agreement and ambiguity.

Put another way, natural languages that incorporate the minimal machinery of agreement and ambiguity are inherently asymptotically intractable. This intractability arises from the interaction of agreement and ambiguity. Informally, human languages and the NP-complete satisfiability problem (SAT) share two costly computational mechanisms: both enforce agreement among terminal symbols across unbounded distances, and both allow terminal symbol ambiguity. In natural language, lexical elements may be required to agree (or disagree) on such features as person, number, gender, case, count, category, reference, thematic role, tense, and abstractness (subject/verb agreement in English, for example); in SAT, agreement ensures the consistency of variable truth assignments. Lexical ambiguity can appear freely in natural language utterances (is "can" a noun, a verb, or an auxiliary verb?), while a variable in a SAT formula may be either true or false. Thus, the linguistic mechanisms for agreement and ambiguity are exactly those needed to simulate satisfiability: any linguistic theory that uses them, as any descriptively adequate theory must, will be computationally intractable.

4 Intractability and linguistic performance 

Having established the inherent computational intractability of descriptively 
adequate linguistic theories, we turn next to the implications of this result 
for models of human sentence processing. We show that the fundamental 
difference between finding and verifying a result surfaces in the agreement 
grammar case, and in the associated natural language examples. 

Following Miller and Chomsky (1963), let us imagine a linguistic performance model M (a "parser") that is fundamentally deterministic and assigns structural descriptions to utterances in real time. Refining their discussion, by "deterministic" we mean that M may have limited parallelism and cannot guess correct answers. This machine model is thought to include all physically realizable computing machines, from the fastest digital computers to the brain.

A consequence of the NP-completeness results of the previous section is that, assuming (as is widely believed) that P ≠ NP, M will not be able to analyze certain constructions involving both ambiguity and long-distance agreement. This result does not dispute that short, unambiguous, or structurally simple utterances can be processed efficiently.⁴ More importantly, given the apparent speed of ordinary language use, the result suggests that actual biological recognizers may be both fast and occasionally inaccurate. M will, however, be able to efficiently "verify" (in a sense to be clarified below) many of the constructions it fails to analyze. The choice of a deterministic performance model, when coupled with the AG model of competence, indicates that some performance limitations will arise out of the deterministic nature of processing (see Berwick and Weinberg, 1982) rather than from the finite nature of human cognitive capacity (see Miller and Chomsky, 1963).

⁴ Thus, the fact that many NP-complete problems have good "average time" solutions does not contradict our result. In fact, given the preponderant distribution of short utterances, it reinforces our result, as the discussion below makes clear.

As evidence of such a performance limitation, consider examples that exhibit excessive lexical and structural ambiguity, as in sentence (1) below, where "buffalo" can be one or many shaggy beasts, a city, or a transitive verb that means "fool." The sentences in (2) demonstrate the same effect with the elaborate agreement processes found in consecutive constituent coordination, discontinuous constituent coordination, rightward movement out of coordinate structures, and gapping. Sentence (1) has an array of possible interpretations, ranging from the simple interpretation suggested by the parallel sentence "Boston buffalo fool Boston buffalo" to more elaborate ones with relative clauses, for example, "[Buffalo that buffalo fool] can fool buffalo."⁵

(1) buffalo buffalo buffalo buffalo buffalo

(2) a. John owned and then sold hundreds of late model cars to us that he waxed all the time.
    b. John liked and wanted to tease Sue and Bill, Mary.
    c. John owned and then sold hundreds of late model cars to us and Bill, trucks.
    d. John owned and then sold hundreds of late model cars to us and to Bill, trucks.

Examples combining the two phenomena become even worse: Buffalo 
buffalo buffalo and buffalo buffalo buffalo of buffalo buffalo to buffalo buffalo 
buffalo buffalo and buffalo, buffalo buffalo. 
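A crude way to see how quickly lexical ambiguity compounds is to count the ways of assigning a lexical category to each word. The sketch assumes, for illustration only, three readings per occurrence of "buffalo" (common noun, proper noun for the city, transitive verb), following the description of sentence (1); the name `count_category_sequences` is hypothetical. It counts candidate category sequences, most of which no grammar would accept, rather than parses.

```python
from itertools import product

# Assumed readings for each occurrence of "buffalo" (illustrative only).
READINGS = ("noun", "proper-noun", "verb")


def count_category_sequences(n_words: int) -> int:
    """Count the lexical category assignments for n_words tokens by brute force."""
    return sum(1 for _ in product(READINGS, repeat=n_words))


if __name__ == "__main__":
    for n in range(1, 9):
        # The brute-force count equals 3 ** n, growing exponentially in n.
        print(n, count_category_sequences(n))
```

For the five-word sentence (1) this gives $3^5 = 243$ candidate sequences, and each additional word triples the count.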

Linguistic agreement and ambiguity may cause intractability in other languages as well. Free word order languages such as Warlpiri, a central Australian Aboriginal language, have special morphology for verbs and for nominal arguments that makes sentences such as the buffalo examples easy to understand when they are directly translated. But the morphological processes in these languages typically allow other highly ambiguous constructions that are difficult to understand. For example, Warlpiri does not distinguish adjectives from nouns either morphologically or configurationally (as English does), making the direct translation of such trivial English sentences as "John flushed the Air Force space shuttle toilet" computationally analogous to the intractable "buffalo" sentences of English. We conclude that there is, in fact, a class of grammatical sentences whose recognition complexity can grow exponentially with their length, and therefore, contrary to common belief, natural language may not be efficiently parsable in general.

⁵ Equivalent sentences can be constructed out of any word whose plural noun form is morphologically identical to its plural verb form: police police police police police . . ., and french french french french french, etc.

Significantly, the preceding natural language examples have the computational character of NP-complete problems: solutions may be hard to find, but they are easy to verify. This is a nontrivial result because there is no a priori reason why solutions to a problem should be easy to verify. Thus, if full natural language understanding were harder than NP-complete, as has been suggested by Chomsky (1980), then some grammatical sentences could never be understood, even with extensive priming and prompting.

The reader's first attempt to understand the buffalo sentence is likely to fail completely. However, it is generally easy to check the paraphrased "solution." The curious nature of this phenomenon confirms the predictions made by the model M, since it is a property of a deterministic machine operating under polynomial time constraints that it will be unable to find an analysis of the agreement-type sentences. On the other hand, M should be able to verify an agreement sentence, since problems in NP, such as AG recognition, are by definition verifiable in polynomial time by a deterministic machine once a solution is supplied. This result also argues that neither agreement nor ambiguity should be bounded by the competence model.

An explanation of the psycholinguistic dichotomy between solving and 
verifying relies critically on the competence/performance distinction. The 
possibility of understanding (verifying) the utterances at all is explained by 
a competence theory that does not bound agreement or ambiguity. On the 
other hand, the difficulty of understanding (solving) the utterances is best 
explained by the deterministic nature of the performance model. 

Agreement and ambiguity, if permitted to operate without bound in 
the speaker, will quickly generate utterances that exceed the (deterministic) 
perceptual capabilities of hearers. These sentences, being too difficult for 
the hearer to understand, will not be used due to the fidelity criterion of 
communication systems (see Chomsky and Miller, 1963, p. 273). The fidelity 
criterion states that the receiver establishes the criterion of acceptability of 
a communication system: if the receiver cannot process a signal, then the 
fidelity of the communication channel is wasted. Simply put, unacceptably 
ambiguous sentences, being difficult for the hearer, are not used in practice, 
"just as many other proliferations of syntactic devices that produce 
well-formed sentences will never actually be found," (Miller and Chomsky, 1963, 
p. 471). 

Nearly all utterances evince both agreement processes and ambiguity, 
to varying degrees. Therefore, there is no reason to expect that occasional 
unacceptability introduced by excessive agreement and ambiguity will cause 
those processes to disappear from language over the course of time. In fact, 
all known natural languages employ these mechanisms. It would be 
reasonable to expect, however, that natural language processing systems might 
develop techniques to efficiently process the "easy" cases and approximately 
process the "hard" ones. 

It remains to develop a complete theory of approximate processing for 
hard problems, but complexity theory again suggests some possible 
answers. One approach is the one we advanced above: hard sentences are not 
in fact solved, but only verified for grammaticality upon paraphrase. 
Another approach is to simply restrict the domain of problems solved: only 
short agreement sentences will be analyzed, and analysis of those exceeding 
a set resource limit will be aborted. 
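The second strategy, aborting once a resource limit is exceeded, can be sketched as a search that counts its own work. This is a hypothetical Python illustration that uses 3SAT as a stand-in for the analysis of agreement sentences; `bounded_search` and its `budget` parameter are our own names, and a real performance model would presumably tie the limit to sentence length or memory rather than to a raw count of candidates.

```python
from itertools import product


def bounded_search(formula, variables, budget):
    """Brute-force 3SAT search that gives up after `budget` candidates.

    A literal is a (variable, negated) pair; a clause is a tuple of literals.
    Returns ("solved", assignment), ("unsatisfiable", None), or
    ("aborted", None) if the resource limit is reached first.
    """
    candidates = product((0, 1), repeat=len(variables))
    for tried, values in enumerate(candidates, start=1):
        if tried > budget:
            return "aborted", None  # resource limit exceeded: abandon the analysis
        assignment = dict(zip(variables, values))
        if all(
            any(bool(assignment[var]) != negated for var, negated in clause)
            for clause in formula
        ):
            return "solved", assignment
    return "unsatisfiable", None
```

With a small budget the search reports "aborted" rather than a verdict, which corresponds to the hearer giving up on a long agreement sentence instead of analyzing it.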

More generally, on this analysis, whenever the computational cost of a 
task matches its observed cognitive cost, we know that scientific explanation 
of the task should occur primarily in a theory of competence and that the 
performance theory is likely to be straightforward: that is, deterministic 
and faithful. But whenever the inherent computational cost differs from 
measured cognitive cost, complexity theory yields specific insight into the 
performance theory: what needs to be explained at that level and the form 
such an explanation might take. 

If complexity theory classifies a cognitive problem as intractable, yet 
humans appear to solve that problem efficiently, this suggests that the per- 
formance algorithm restricts its input domain or solves costly instances only 
approximately (as in simulated annealing; see Kirkpatrick, Gelatt, and 
Vecchi, 1983, for further discussion), or perhaps that it is aided by parallel 
hardware specially designed for the cognitive problem at hand. 
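A minimal sketch of the approximate strategy, in Python, assuming the MAX-3SAT objective (maximize the number of satisfied clauses) as a stand-in for a costly cognitive problem. The function names, the single-variable flip move, the linear cooling schedule, and the default `steps`, `start_temp`, and `seed` values are our own illustrative choices, not taken from Kirkpatrick, Gelatt, and Vecchi (1983).

```python
import math
import random


def count_satisfied(formula, assignment):
    """Number of clauses with at least one true literal."""
    return sum(
        any(bool(assignment[var]) != negated for var, negated in clause)
        for clause in formula
    )


def anneal(formula, steps=2000, start_temp=2.0, seed=0):
    """Approximate MAX-3SAT by simulated annealing with single-variable flips.

    Returns the best assignment seen and how many clauses it satisfies; the
    result need not be optimal.
    """
    rng = random.Random(seed)
    variables = sorted({var for clause in formula for var, _ in clause})
    current = {var: rng.randint(0, 1) for var in variables}
    score = count_satisfied(formula, current)
    best, best_score = dict(current), score
    for step in range(steps):
        if best_score == len(formula):
            break  # every clause satisfied; nothing left to improve
        temp = start_temp * (1 - step / steps) + 1e-9  # linear cooling
        var = rng.choice(variables)
        current[var] ^= 1  # propose flipping one variable
        new_score = count_satisfied(formula, current)
        delta = new_score - score
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            score = new_score  # accept: always if no worse, sometimes if worse
            if score > best_score:
                best, best_score = dict(current), score
        else:
            current[var] ^= 1  # reject: undo the flip
    return best, best_score
```

Accepting occasional worsening moves while the temperature is high lets the search escape local optima; as the temperature falls it behaves more and more like greedy local search.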

On the other hand, a problem could be easy in principle, yet impossi- 
ble for people to solve. Then the performance algorithm might be simple- 
minded, inefficient, or quite restricted (as with the "no reentrant procedures" 
constraint of an early performance model), or the mental hardware might 
limit memory use or processing time. 

To take a simple example from another cognitive domain, Kirousis and 
Papadimitriou (1985) consider the complexity of the historically important 
line-labeling problem in machine vision. They prove that the line-labeling 
problem and the more general scene recognition problem are NP-complete, 
and likely to be intractable. Given the apparent speed with which humans 
recognize scenes, and hence the surprising nature of their result, they suggest 
that computationally difficult scenes are scarce in practice, or that real-world 
hints (for example, surface texture and assorted depth cues) might simplify 
the real-world scene recognition problem. 

In either situation, then, the complexity analysis of information-processing 
tasks can lead to significant conclusions about linguistic performance 
because complexity theory makes strong empirical predictions. Agreement 
grammars provide a linguistically and algorithmically neutral model for 
agreement and ambiguity in natural languages. Agreement grammar recog- 
nition is theoretically intractable. We have also observed that the buffalo- 
type sentences are difficult for humans to process. In order to explain this 
apparent match between predicted intractability and observed cognitive dif- 
ficulty, we are led to postulate a deterministic processing model for English, 
and perhaps all natural languages. 
A Formal Results 

In this appendix we give additional details on the proof technique of reduc- 
tion, followed by a formal proof of the NP-completeness result sketched in 
the main text. 

A.1 Reduction as a proof technique 

Complexity classifications are established with the proof technique of reduc- 
tion. A reduction converts instances of a problem T of known complexity 
into instances of a problem S whose complexity we wish to determine. The 
reduction operates in polynomial time. Therefore, if we had a polynomial 
time algorithm for solving S, then we could also solve T in polynomial time, 
simply by converting instances of T into S. (This follows because the compo- 
sition of two polynomial time functions is also polynomial time.) Formally, if 
we choose T to be NP-complete, then the polynomial time reduction shows 
that S is at least as hard as T, or NP-hard. If we were also to prove that S 
was in NP, then S would be NP-complete. 

In this case, the known NP-complete problem T that we will use is 3SAT, 
and the problem S of unknown complexity is AG-Recognition. Therefore, 
the proof will reduce instances of 3SAT (a 3-conjunctive normal form or 
3-CNF Boolean formula F) into instances of AG-Recognition (an AG G and 
input string x). The 3-Satisfiability problem (3SAT) is to determine, given 
a Boolean expression in 3-CNF, whether the formula is satisfiable. 3SAT is 
NP-complete. An example of a satisfiable 3-CNF Boolean formula with five 
clauses is: 

$$(a \lor b \lor c) \land (a \lor d \lor e) \land (e \lor d \lor c) \land (b \lor c \lor d) \land (a \lor d \lor e)$$

A Boolean expression is an expression composed of variables (e.g., $x$), 
parentheses, and the logical operators $\lor$ (OR), $\land$ (AND), and negation. 
Negation is represented as a horizontal bar over the negated expression (e.g., 
$\bar{x}$ is the negation of the variable $x$). A literal is a variable or the 
negation of a variable. Variables may have the values 0 (false) and 1 (true), 
as do expressions. An expression is satisfiable if there is some assignment of 
0's and 1's to the variables that gives the expression the value 1. 

A Boolean expression is in conjunctive normal form (CNF) if it is of the 
form $E_1 \land E_2 \land \cdots \land E_k$ and each clause $E_i$ is of the form 
$\alpha_{i1} \lor \alpha_{i2} \lor \cdots \lor \alpha_{im_i}$, where each $\alpha_{ij}$ is a 
literal, either a variable $x$ or a negated variable $\bar{x}$. An expression is in 
3-CNF if each clause in the CNF expression contains exactly three distinct 
literals. 
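The definitions above can be turned into a small checker. The sketch below is our own Python illustration: it assumes an ASCII surface syntax (`|` for OR, `&` for AND, `~` for the negation bar) that the text does not use, parses a CNF formula into clauses of `(variable, negated)` literals, and tests the 3-CNF condition of exactly three distinct literals per clause.

```python
def parse_cnf(text):
    """Parse e.g. "(a | ~b | c) & (~a | d | e)" into a list of clauses.

    Each clause is a tuple of (variable, negated) literals. Assumes the
    input is already in CNF; parentheses are not otherwise validated.
    """
    formula = []
    for clause_text in text.split("&"):
        literals = []
        for token in clause_text.strip().strip("()").split("|"):
            token = token.strip()
            negated = token.startswith("~")
            literals.append((token.lstrip("~"), negated))
        formula.append(tuple(literals))
    return formula


def is_3cnf(formula):
    """True if every clause contains exactly three distinct literals."""
    return all(len(clause) == 3 and len(set(clause)) == 3 for clause in formula)
```

For example, `(x | ~x | y)` passes the check, since a variable and its negation count as distinct literals, while `(a | a | b)` does not.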

A.2 AG Recognition is NP-complete 

Lemma A.1 Let $(\varphi_0, \ldots, \varphi_k)$ be a shortest leftmost derivation of $\varphi_k$ from 
$\varphi_0$ in an agreement grammar G containing at least one branching production. 
If $k > |P|$, where P is the set of productions in G, then $|\varphi_k| > |\varphi_0|$. 

Proof. In the step $\varphi_i \Rightarrow \varphi_{i+1}$, where $\varphi_i = \alpha A' \beta$ and $\varphi_{i+1} = \alpha \gamma' \beta$ for 
$\alpha \in V_T^*$, $\beta \in (V_T \cup V_N)^*$, one of the following cases must hold: 

1. The production $A \to \gamma$ with extension $A' \to \gamma'$ is nonbranching 
($|\gamma| = 1$). In the worst case, we could cycle through every possible 
nonbranching production (without using a branching production), after 
which we would begin to reuse them. Any extension of a production 
that has already been used in this run of nonbranching productions 
could have been guessed previously, and the length of the shortest 
nonbranching run must be less than $|P|$. 

2. The production $A \to \gamma$ with extension $A' \to \gamma'$ is branching ($|\gamma| > 1$). 
Then $|\varphi_i| < |\varphi_{i+1}|$. 

A total of at most n - 1 branching productions derives an utterance of length 
n, because there are no null-transitions in an agreement grammar. Each 
branching production can be separated from the closest other branching 
production in the derivation by a run of at most |G| nonbranching 
productions, and the shortest derivation of x will be of length $O(|G| \cdot |x|)$. 
(As is conventional in computer science, the expression $O(x)$ stands for a 
quantity bounded above by a constant multiple of x.) 

Theorem 1 Agreement grammar recognition is in NP. 



6 If the agreement grammar G does not contain a branching production, then L(G) 
contains only strings of length one and all shortest derivations are shorter than |P|: 
membership for such a grammar is clearly in NP. 
Proof. On input agreement grammar G and input string $x \in V_T^*$, guess a 
derivation of x in nondeterministic polynomial time as follows.⁷ 

1. Guess an extension $S'$ of $S$, and let $S'$ be the derivation string. 

2. For a derivation string $\alpha A' \beta$, where $\alpha \in V_T^*$ and $\beta \in (V_T \cup V_N)^*$, guess a 
production $A \to \gamma$ and an extension $A' \to \gamma'$ of it. Let $\alpha \gamma' \beta$ be the new 
derivation string. 

3. If $\alpha \gamma' \beta = x$, accept. 

4. If $|\alpha \gamma' \beta| > |x|$, reject. 

5. Loop to step 2 (at most $|G| \cdot |x|$ times). 

Every loop of the nondeterministic algorithm performs one step in the 
derivation. By Lemma A.1, the shortest derivation of x is of length at most 
$O(|G| \cdot |x|)$, so we need to loop through the algorithm at most that many 
times. Guessing an extension of a category may be performed in time $O(|F|)$, 
and an extension of a production may be guessed in time $O(|F| \cdot |P|)$. This 
nondeterministic algorithm runs in polynomial time and accepts exactly 
L(G); hence AG Recognition is in NP. □ 
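The deterministic half of this argument, checking a guessed derivation, can be sketched in Python. The sketch makes a simplifying assumption not in the original: each extended production A' → γ' is treated as an ordinary rewrite rule over fully specified categories, encoded as plain strings, so the feature machinery is abstracted away. The certificate is the list of derivation strings, and the hypothetical parameter `max_steps` stands in for the |G| · |x| bound of Lemma A.1; `check_derivation` is likewise our own name.

```python
def check_derivation(productions, terminals, start, x, derivation, max_steps):
    """Deterministically check a guessed leftmost derivation (the certificate).

    productions: list of (lhs, rhs_tuple) over fully specified categories.
    derivation: list of tuples of symbols, from (start,) to tuple(x).
    """
    if not derivation or derivation[0] != (start,) or derivation[-1] != tuple(x):
        return False
    if len(derivation) - 1 > max_steps:
        return False  # longer than the bound from Lemma A.1
    for current, following in zip(derivation, derivation[1:]):
        # Locate the leftmost nonterminal in the current derivation string.
        pos = next((i for i, sym in enumerate(current) if sym not in terminals), None)
        if pos is None:
            return False  # nothing left to rewrite, yet the derivation continues
        alpha, head, beta = current[:pos], current[pos], current[pos + 1:]
        if not any(lhs == head and following == alpha + rhs + beta
                   for lhs, rhs in productions):
            return False  # no production licenses this step
        if len(following) > len(x):
            return False  # step 4 of the proof: reject overlong strings
    return True
```

Each step costs time polynomial in |G| and |x|, and there are at most `max_steps` steps, so the whole check is polynomial, which is what membership in NP requires.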

Theorem 2 AG Recognition is NP-hard. 

Proof. We reduce 3SAT to AG Recognition in polynomial time. Given a 
3CNF formula f of length m using the n variables $q_1 \ldots q_n$, we construct an 
agreement grammar $G_f$ such that the string w is an element of $L(G_f)$ iff f 
is satisfiable, where w is the string of formula literals in f. $G_f$ is constructed 
as follows: 

1. $G_f$ includes the set F of feature names {STAGE, LITERAL, $q_1, \ldots, q_n$} 
with values defined by the function $\rho$: 

$$\rho(\text{STAGE}) = \{1, \ldots, n+3\}, \qquad \rho(\text{LITERAL}) = \{+, -\}, \qquad \rho(q_i) = \{0, 1\}$$



7 Again, we assume G contains at least one branching production. If not, then we 
should only loop as many times as there are productions, and then halt. 
The grammar will assign truth-values to the variables and check 
satisfaction in n + 3 stages, synchronized by the feature STAGE. The start 
category is {[STAGE 1]}. 

2. At each of the first n stages, a value is chosen for one variable; because 
the $q_i$ are declared as agreement features, the values that are chosen 
will be maintained throughout the derivation tree. The following 2n 
nonbranching rules are needed, constructed for all i, $1 \le i \le n$: 

{[STAGE i], [q_i 0]} → {[STAGE i+1], [q_i 0]} 
{[STAGE i], [q_i 1]} → {[STAGE i+1], [q_i 1]} 

Note that square brackets ([ ]) delimit features, while curly brackets 
({ }) delimit the sets of features that form nonterminals. 

3. At stage n + 1, the grammar has guessed truth assignments for all 
variables; all that remains is to use the truth assignments to generate 
satisfied three-literal clauses. The following two rules generate enough 
clauses to match the number of clauses in w: 

{[STAGE n+1]} → {[STAGE n+2]} 

{[STAGE n+1]} → {[STAGE n+1]} {[STAGE n+2]} 

4. At stage n + 2, the grammar generates satisfied three-literal clauses, 
that is, clauses containing at least one true literal. Let C_0 and C_1 be the 
following categories: 

C_0 = {[STAGE n+3], [LITERAL -]} 
C_1 = {[STAGE n+3], [LITERAL +]} 

Then the following 7 ternary-branching rules are needed; any set of 
three literals makes the clause true, provided at least one literal is 
true: 

{[STAGE n+2]} → C_0 C_0 C_1 
{[STAGE n+2]} → C_0 C_1 C_0 
{[STAGE n+2]} → C_1 C_0 C_0 
{[STAGE n+2]} → C_0 C_1 C_1 
{[STAGE n+2]} → C_1 C_0 C_1 
{[STAGE n+2]} → C_1 C_1 C_0 
{[STAGE n+2]} → C_1 C_1 C_1 
5. Finally, lexical insertion at stage n + 3 ties together the truth-values 
chosen for the variables and the literals. For every $q_i$, $1 \le i \le n$, we 
need the following four nonbranching rules, bringing us to a total of 
6n + 9 rules: 

{[STAGE n+3], [LITERAL +], [q_i 1]} → $q_i$ 

{[STAGE n+3], [LITERAL -], [q_i 0]} → $q_i$ 

{[STAGE n+3], [LITERAL +], [q_i 0]} → $\bar{q}_i$ 

{[STAGE n+3], [LITERAL -], [q_i 1]} → $\bar{q}_i$ 
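Rules 4 and 5 amount to two small tables, which the following Python sketch spells out. The encoding is our own: `literal_sign` returns the LITERAL value that the four stage n + 3 rules allow for $q_i$ (`negated=False`) or $\bar{q}_i$ (`negated=True`), given the value chosen for $q_i$, and `CLAUSE_PATTERNS` lists the right-hand sides of the seven stage n + 2 rules as every C_0/C_1 triple except C_0 C_0 C_0.

```python
from itertools import product


def literal_sign(negated, value):
    """LITERAL value permitted by lexical insertion for q_i (negated=False)
    or its negation (negated=True) when feature q_i has the given value."""
    is_true = bool(value) != negated
    return "+" if is_true else "-"


# Stage n+2 right-hand sides: all triples over {C0, C1} with at least one C1,
# i.e. every three-literal clause containing a true literal.
CLAUSE_PATTERNS = [
    triple for triple in product(("C0", "C1"), repeat=3)
    if triple != ("C0", "C0", "C0")
]
```

Counting the rules in items 2 through 5 gives 2n + 2 + 7 + 4n = 6n + 9, the total stated above.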

If some extension of the start category S = {[STAGE 1]} derives w, 
then the formula f is satisfiable: each extension of the start category that 
generates a string must encode a satisfying truth assignment. Conversely, if f 
is satisfiable, the extension that encodes a satisfying assignment derives w. 
For example, the category 

{[STAGE 1], [q_1 1], [q_2 0], ..., [q_n 1]} 

generates 3-CNF formulas f with the satisfying truth assignment $q_1 = 1$, 
$q_2 = 0$, ..., $q_n = 1$. Note that the agreement grammar constructed in the 
reduction generates all satisfiable 3CNF Boolean formulas, of any length, 
using n or fewer variables. Since $G_f$ has 6n + 9 productions over n + 2 
features, it can be constructed in time polynomial in the length of f. □ 
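The whole reduction, and the way it lets an AG recognizer decide 3SAT, can be sketched in Python under several assumptions of our own. `build_grammar` lists the productions of $G_f$ with categories as feature dictionaries, which lets us confirm the 6n + 9 count. `recognize` is not a general agreement grammar parser: it is a brute-force membership test specialized to the shape of $G_f$, trying each extension of the start category (one per truth assignment) and checking that w splits into three-literal clauses that each contain a literal marked +. Literals are encoded as hypothetical `(i, negated)` pairs for $q_i$, and all function names are illustrative.

```python
from itertools import product


def build_grammar(n):
    """Productions of G_f for n variables; categories are feature dicts and
    terminals are (i, negated) pairs standing for q_i or its negation."""
    rules = []
    for i in range(1, n + 1):  # stages 1..n: choose a value for q_i
        for v in (0, 1):
            rules.append(({"STAGE": i, f"q{i}": v}, [{"STAGE": i + 1, f"q{i}": v}]))
    s1, s2, s3 = n + 1, n + 2, n + 3
    rules.append(({"STAGE": s1}, [{"STAGE": s2}]))  # last clause
    rules.append(({"STAGE": s1}, [{"STAGE": s1}, {"STAGE": s2}]))  # more clauses
    for signs in product("-+", repeat=3):  # stage n+2: satisfied clauses only
        if signs != ("-", "-", "-"):
            rules.append(({"STAGE": s2}, [{"STAGE": s3, "LITERAL": s} for s in signs]))
    for i in range(1, n + 1):  # stage n+3: lexical insertion
        for sign, v, negated in (("+", 1, False), ("-", 0, False),
                                 ("+", 0, True), ("-", 1, True)):
            rules.append(({"STAGE": s3, "LITERAL": sign, f"q{i}": v}, [(i, negated)]))
    return rules


def recognize(n, w):
    """Brute-force test of whether w is in L(G_f), specialized to G_f's shape."""
    if not w or len(w) % 3:
        return False  # G_f only yields one or more three-literal clauses
    for values in product((0, 1), repeat=n):  # one extension of the start category each
        signs = ["+" if bool(values[i - 1]) != negated else "-" for i, negated in w]
        if all("+" in signs[k:k + 3] for k in range(0, len(signs), 3)):
            return True
    return False


def decide_3sat(n, clauses):
    """Decide 3SAT by composing the reduction (flatten to w) with recognition."""
    w = [literal for clause in clauses for literal in clause]
    return recognize(n, w)
```

Because the reduction itself is cheap, any polynomial-time replacement for `recognize` would yield a polynomial-time 3SAT decider; in this sketch the exponential cost sits entirely in the loop over the 2^n extensions of the start category.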

Figure 1 contains a sample reduction, showing the parse tree needed to 
analyze a 3SAT instance recoded as an AG parsing problem. The 3SAT 
instance to solve is $(u_1 \lor \bar{u}_2 \lor u_3) \land (\bar{u}_1 \lor \bar{u}_3 \lor u_3)$, which has the satisfying 
assignment $u_1 = 1$, $u_2 = 1$ and $u_3 = 0$. The corresponding input string to 
be parsed is a formula of literals, $u_1 \bar{u}_2 u_3 \bar{u}_1 \bar{u}_3 u_3$. The agreement features are 
therefore $\{u_1, u_2, u_3\}$. 
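As a check on the figure, the following Python sketch recomputes the truth values of the six literals under the stated assignment; these are the values at the tree's leaves, and each consecutive triple names the clause pattern generated at stage n + 2. The `(variable, negated)` encoding is our own, and the sketch assumes the instance's literals read $u_1 \bar{u}_2 u_3 \bar{u}_1 \bar{u}_3 u_3$.

```python
# Literals of the Figure 1 instance, in input order, as (variable, negated) pairs.
literals = [("u1", False), ("u2", True), ("u3", False),
            ("u1", True), ("u3", True), ("u3", False)]
assignment = {"u1": 1, "u2": 1, "u3": 0}

# Truth value of each literal under the assignment (1 = true, 0 = false).
leaf_values = [int(bool(assignment[var]) != negated) for var, negated in literals]

# Group into three-literal clauses, e.g. "100" for the first clause.
clause_patterns = ["".join(str(v) for v in leaf_values[k:k + 3])
                   for k in range(0, len(leaf_values), 3)]

print(leaf_values)      # [1, 0, 0, 0, 1, 0]
print(clause_patterns)  # ['100', '010']
```

Every clause pattern contains a 1, which is why both clauses can be generated by the stage n + 2 rules.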
[Figure 1: parse tree. Start → guess variable truth assignments [u1 1], [u2 1], [u3 0] → [GENERATE-SAT-3CLAUSES] → [GENERATE-100-3CLAUSE] [GENERATE-010-3CLAUSE] → leaves [Literal 1] [Literal 0] [Literal 0] [Literal 0] [Literal 1] [Literal 0].]
Figure 1: A sample reduction for the agreement grammar proof, showing 
a 3SAT instance recoded as an AG parsing problem. The 3SAT problem to solve 
is $(u_1 \lor \bar{u}_2 \lor u_3) \land (\bar{u}_1 \lor \bar{u}_3 \lor u_3)$, which has the satisfying assignment $u_1 = 1$, 
$u_2 = 1$ and $u_3 = 0$. The corresponding input string to be parsed is a formula 
of literals, $u_1 \bar{u}_2 u_3 \bar{u}_1 \bar{u}_3 u_3$. The agreement features are therefore $\{u_1, u_2, u_3\}$. The 
solution expressed at the terminal leaves of the tree is $u_1 = 1$ (true); $u_2 = 1$ 
(true); and $u_3 = 0$ (false). 